Sound Event Recognition in Unstructured Environments using Spectrogram Image Processing

Author

  • Jonathan William Dennis
Abstract

The objective of this research is to develop feature extraction and classification techniques for the task of sound event recognition (SER) in unstructured environments. Although this field is traditionally overshadowed by the popular field of automatic speech recognition (ASR), an SER system that can achieve human-like sound recognition performance opens up a range of novel application areas. These include acoustic surveillance, bio-acoustical monitoring, environmental context detection, healthcare applications and more generally the rich transcription of acoustic environments. The challenges in such environments are adverse effects such as noise, distortion and multiple sources, which are more likely to occur with distant microphones compared to the close-talking microphones that are more common in ASR. In addition, the characteristics of acoustic events are less well defined than those of speech, and there is no sub-word dictionary available like the phonemes in speech. Therefore, the performance of ASR systems typically degrades dramatically in these challenging unstructured environments, and it is important to develop new methods that can perform well for this challenging task. In this thesis, the approach taken is to interpret the sound event as a two-dimensional spectrogram image, with the two axes as the time and frequency dimensions. This enables novel methods for SER to be developed based on spectrogram image processing, which are inspired by techniques from the field of image processing. The motivation for such an approach is based on finding an automatic approach to “spectrogram reading”, where it is possible for humans to visually recognise the different sound event signatures in the spectrogram. The advantages of such an approach are twofold. Firstly, the sound event image representation makes it possible to naturally capture the sound information in a two-dimensional feature. This has advantages over conventional one-dimensional frame-based features, which capture only a slice of spectral information
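The idea of treating a sound as a two-dimensional time-frequency image can be illustrated with a minimal Python sketch. It is not the method from the thesis: the function name `spectrogram_image`, the frame length, the hop size and the min-max normalisation are all assumptions chosen for illustration, and the input is a synthetic tone rather than a real sound event.

```python
import numpy as np

def spectrogram_image(signal, frame_len=256, hop=128):
    """Return a (frequency x time) grayscale image in [0, 1] from a 1-D signal.

    Hypothetical helper: frame_len and hop are illustrative defaults,
    not values taken from the thesis.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames (one row per frame).
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude spectrum of each frame; transpose so rows are frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Log power compresses the dynamic range, as in a displayed spectrogram.
    log_power = np.log(magnitude ** 2 + 1e-10)
    # Min-max normalise so the array can be treated as a grayscale image.
    return (log_power - log_power.min()) / (log_power.max() - log_power.min())

# Synthetic example (assumption): a 1 kHz tone sampled at 8 kHz for one second.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
image = spectrogram_image(tone)
```

Each column of `image` corresponds to one conventional frame-based spectral slice, while the full array keeps the joint time-frequency structure that image-processing techniques can operate on.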


Similar resources

Sound Event Recognition and Classification in Unstructured Environments

The objective of this research is to develop feature extraction and classification techniques for the task of acoustic event recognition (AER) in unstructured environments, which are those where adverse effects such as noise, distortion and multiple sources are likely to occur. The goal is to design a system that can achieve human-like sound recognition performance on a variety of hearing tasks...


Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of an SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...


Overlapping Sound Event Recognition using Local Spectrogram Features with the Generalised Hough Transform

We present a novel approach for recognition of overlapping sound events based on the Generalised Hough Transform (GHT) – a technique commonly used for object recognition in the domain of image processing. Unlike our previous work on image-based sound event classification, where we focussed on global image features, here we extract local features from detected interest-points in the spectrogram....


Analysis of spectrogram image methods for sound event classification

The time-frequency spectrogram representation of an audio signal can be visually analysed by a trained researcher to recognise any underlying sound events in a process called “spectrogram reading”. However, this has not become a popular approach for automatic classification, as the field is driven by Automatic Speech Recognition (ASR) where frame-based features are popular. As opposed to speech...


Robust sound event classification using LBP-HOG based bag-of-audio-words feature representation

This paper addresses the problem of sound event classification, focusing on feature extraction methods which are robust in noisy environments. In the real world, sound events can easily be exposed to noisy conditions, which corrupt their distinctive temporal and spectral characteristics. Therefore, extracting robust features to represent these characteristics is important in achieving good classi...




Publication date: 2014